Character-level Recurrent Text Prediction
Abstract
Text prediction is a common application of language models on mobile devices. Currently, the state-of-the-art models are neural networks. Unfortunately, mobile devices are constrained in both computing power and storage and are thus unable to run most (if not all) such networks. Recently, however, character-level architectures have emerged that outperform previous architectures for machine translation. They are advantageous in that they do not require a word embedding matrix and therefore need far less space. This project evaluates one recent character-level architecture on the text prediction task. We find that the network performs well qualitatively and achieves perplexity close to existing methods on the Brown corpus.
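To make the setup concrete, the following is a minimal sketch of a character-level recurrent language model trained for next-character prediction and evaluated by perplexity. PyTorch, the GRU cell, the hidden size, the toy corpus, and the training loop below are illustrative assumptions, not the exact architecture or configuration evaluated in this project.

# Minimal character-level recurrent language model sketch (assumed PyTorch setup).
import math
import torch
import torch.nn as nn

class CharRNN(nn.Module):
    def __init__(self, vocab_size, hidden_size=256):
        super().__init__()
        # Character embedding table: with only a few dozen symbols it is far
        # smaller than a word embedding matrix, which is the space advantage
        # noted in the abstract.
        self.embed = nn.Embedding(vocab_size, hidden_size)
        self.rnn = nn.GRU(hidden_size, hidden_size, batch_first=True)
        self.out = nn.Linear(hidden_size, vocab_size)

    def forward(self, x, h=None):
        e = self.embed(x)              # (batch, seq, hidden)
        y, h = self.rnn(e, h)          # (batch, seq, hidden)
        return self.out(y), h          # logits over the next character

def perplexity(model, ids):
    """Character-level perplexity: exp of the mean next-character cross-entropy."""
    model.eval()
    with torch.no_grad():
        x, targets = ids[:, :-1], ids[:, 1:]
        logits, _ = model(x)
        loss = nn.functional.cross_entropy(
            logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
    return math.exp(loss.item())

if __name__ == "__main__":
    # Toy corpus standing in for Brown-corpus text; purely for illustration.
    text = "the quick brown fox jumps over the lazy dog "
    chars = sorted(set(text))
    stoi = {c: i for i, c in enumerate(chars)}
    ids = torch.tensor([[stoi[c] for c in text]])  # (1, seq_len)

    model = CharRNN(vocab_size=len(chars))
    optim = torch.optim.Adam(model.parameters(), lr=1e-2)
    for _ in range(200):               # tiny training loop on the toy string
        logits, _ = model(ids[:, :-1])
        loss = nn.functional.cross_entropy(
            logits.reshape(-1, logits.size(-1)), ids[:, 1:].reshape(-1))
        optim.zero_grad(); loss.backward(); optim.step()
    print("perplexity:", perplexity(model, ids))

In practice, evaluation would stream the Brown corpus in batches and report perplexity on a held-out split rather than on the training text.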
Similar Articles
Make Deep Learning Great Again: Character-Level RNN Speech Generation in the Style of Donald Trump
A character-level recurrent neural network (RNN) is a statistical language model capable of producing text that superficially resembles a training corpus. The efficacy of these networks in mimicking Shakespeare, Linux source code, and other forms of text has already been demonstrated. In this paper, we show that character-level RNNs are capable of very believably mimicking the language of Pres...
Recurrent neural networks based Indic word-wise script identification using character-wise training
This paper presents a novel methodology for Indic handwritten script recognition using Recurrent Neural Networks and addresses the problem of script recognition in poor data scenarios, such as when only character-level online data is available. It is based on the hypothesis that curves of online character data comprise sufficient information for prediction at the word level. Online character dat...
Sequence to Sequence Learning for Optical Character Recognition
We propose an end-to-end recurrent encoder-decoder based sequence learning approach for printed text Optical Character Recognition (OCR). In contrast to the present-day state-of-the-art OCR solution [Graves et al. (2006)], which uses a CTC output layer, our approach makes minimal assumptions on the structure and length of the sequence. We use a two-step encoder-decoder approach: (a) A recur...
Character-level Convolutional Networks for Text Classification
This article offers an empirical exploration of the use of character-level convolutional networks (ConvNets) for text classification. We constructed several large-scale datasets to show that character-level convolutional networks can achieve state-of-the-art or competitive results. Comparisons are offered against traditional models such as bag-of-words, n-grams and their TF-IDF variants, and de...
Text segmentation with character-level text embeddings
Learning word representations has recently seen much success in computational linguistics. However, assuming sequences of word tokens as input to linguistic analysis is often unjustified. For many languages word segmentation is a non-trivial task and naturally occurring text is sometimes a mixture of natural language strings and other character data. We propose to learn text representations dir...